ABSTRACT

Diabetes mellitus (DM) is one of the most harmful diseases in the world. It is associated with factors such as obesity, high blood glucose levels and lack of exercise, and it can be managed if it is detected at an early stage. Machine learning is the construction of computer systems or programs that can adapt and learn from experience. The PIMA dataset, which contains 9 attributes for each of 768 patients, is used in this research work. Among the many kinds of machine learning algorithms, we chose three supervised learning algorithms: Logistic Regression, Decision Tree and Random Forest. Each model was trained and tested, and the models' performance was then compared and analyzed using four measures: accuracy, F-measure, recall and precision. Logistic Regression achieved the highest accuracy (77%), the highest precision (0.77) and the highest F-measure (0.64), while Decision Tree achieved the highest recall (0.58).

Keywords: Diabetes, Machine learning, Logistic Regression, Decision tree, Random forest
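The four performance measures named in the abstract can be illustrated with a short plain-Python sketch. This is not the authors' code. The function name `binary_metrics`, the use of 1 as the positive (diabetic) class, the toy labels, and the convention of returning 0.0 when a denominator is zero are all assumptions made for illustration. F-measure is taken here to be F1, the harmonic mean of precision and recall.

```python
# Illustrative sketch (hypothetical helper, not the paper's implementation):
# accuracy, precision, recall and F-measure (assumed F1) from binary labels.

def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F-measure for binary labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    # Confusion-matrix counts with respect to the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    accuracy = (tp + tn) / len(y_true)
    # Assumed convention: a measure with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


if __name__ == "__main__":
    # Toy labels only; these are not PIMA results.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 1, 1, 0]
    print({k: round(v, 3) for k, v in binary_metrics(y_true, y_pred).items()})
    # → {'accuracy': 0.625, 'precision': 0.6, 'recall': 0.75, 'f_measure': 0.667}
```

The toy example shows why the measures can rank models differently: a classifier can gain recall by predicting more positives while losing precision, which is consistent with one model leading on recall and another on precision and F-measure.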